All Questions
Tagged with python web-scraping
30,100 questions
-4 votes
0 answers
34 views
Python Selenium Firefox (Geckodriver) - Script runs on Windows but fails on Linux server (TimeoutError), both as an EXE and as a Python script
I have written a Python script for web scraping using Selenium with Firefox (Geckodriver). The script runs perfectly on Windows, but when I run it on Linux — either as a Python script or packaged as ...
-2 votes
1 answer
71 views
Issues with Automated Twitter Account Creation Bot in Python (Playwright) - Unable to Find "Authenticate" Button
I'm developing a bot in Python to automate the account creation process on Twitter (X) using Playwright, but I am consistently facing issues in certain steps, especially when trying to find and click ...
1 vote
2 answers
110 views
Can't close cookie pop up on website with selenium webdriver
I am trying to use Selenium to click the "Accept all" or "Reject all" button on a cookie pop-up for the website autotrader.co.uk, but I cannot get it to make the pop-up disappear for some reason. This ...
0 votes
0 answers
35 views
How to scrape tweet/thread and its replies based on conversation_id [closed]
I’m currently working on a project that involves scraping a single tweet and all its replies using tweet-harvest with an auth_token. Everything works fine, but I recently ran into an issue where I can ...
2 votes
0 answers
60 views
python-requests-html render inconsistent result
Background: by default the website shows only a few names, and there's a "moreBtn" to generate the full list. Code idea: create an HTML session, render with a script clicking the "moreBtn" ...
-1 votes
2 answers
37 views
Why am I getting no data using BeautifulSoup and requests when scraping a news website?
import requests from bs4 import BeautifulSoup url = "https://example-news-site.com" headers = { "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)" } response =...
-1 votes
2 answers
76 views
I wanted to get the number of playoff games of a list of 200 players from Basketball Reference. The code I wrote is giving me 0 value for all players [closed]
I want to get the number of playoff games played by a list of players. To do that I used Selenium and Beautiful Soup. The result is being saved in a CSV file but the value for each of the players is ...
1 vote
2 answers
51 views
Importing geographic data with WFS works on Chrome but not on Python
I am trying to pull a geojson file from here. The JSON appears as expected when I paste that link into Chrome or Safari. However, I get the following error every time when I run the following code on ...
-3 votes
0 answers
27 views
I tried to parse the page but always get duplicate text [closed]
I can parse the text from multiple pages, but the output contains duplicate paragraphs. For example, the content of the first page is parsed 3 times in total. I use Python code and I ...
-1 votes
0 answers
72 views
How to scrape the full New York Times article content using Selenium and BeautifulSoup without triggering the "Please enable JavaScript" message?
I'm building a scraper that fetches full article content from the New York Times using both the Article Search API and a hybrid static + Selenium-based HTML scraper. My goal is to extract complete ...
1 vote
2 answers
67 views
How to detect and scrape a specific language version of a multilingual publication, if available?
I wrote a Python script for scraping data from the WHO website. I wanted to retrieve the title, author name, date, PDF link and child page link from the parent page (I applied some filters on the parent page). I am ...
-1 votes
0 answers
34 views
Scrapy: "RuntimeError: Engine Not Running" when I try to run my spider after installing Scrapy-Playwright
Background: I just installed scrapy-playwright in my virtual environment in order to scrape a website that renders some links I need with JavaScript. The installation went well, but when I ran my ...
0 votes
0 answers
55 views
Crawl4AI token threshold not applied to raw html in arun
Here’s a brief overview of what I want to achieve: extract raw HTML pages and save them; use Crawl4AI to produce a ‘cleaner’ and smaller HTML that has a lot of information, including what I will eventually ...
-3 votes
1 answer
49 views
How to switch to a popup cookie consent page?
I'm using Python 3.12.3, Selenium 4.31.0, and the Firefox driver on Ubuntu 24.04. When I try to open a URL, a cookie consent popup appears, offering to continue without accepting, to accept, or to see more options. How can I ...
0 votes
0 answers
56 views
Extract span values using BS4
I'm trying to extract "Date Applied" and "17 Apr 2025 06:00" from the HTML below: <span class="labels" part="text-and-icon-labels"> <slot part="...